Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: Implementation and early performance measurements
Abstract
This paper presents results on a communications-intensive kernel, the three-dimensional fast Fourier transform (3D FFT), running on the 2,048-node Blue Gene/L (BG/L) prototype. Two implementations of the volumetric FFT algorithm were characterized, one built on the Message Passing Interface library and another built on an active packet Application Program Interface supported by the hardware bring-up environment, the BG/L advanced diagnostics environment. Preliminary performance experiments on the BG/L prototype indicate that both of our implementations scale well up to 1,024 nodes for 3D FFTs of size 128 × 128 × 128. The performance of the volumetric FFT is also compared with that of the Fastest Fourier Transform in the West (FFTW) library. In general, the volumetric FFT outperforms a port of the FFTW Version 2.1.5 library on large-node-count partitions.
Similar resources
Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer
This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Adv...
Performance of the 3D FFT on the 6D network torus QCDOC parallel supercomputer
QCDOC is a massively parallel supercomputer with tens of thousands of nodes distributed on a six-dimensional torus network. The 6D structure of the network provides the needed communication resources for many communication-intensive applications. In this paper, we present a parallel algorithm for three-dimensional Fast Fourier Transform and its implementation for a 4096-node QCDOC prototype. Tw...
Overview of molecular dynamics techniques and early scientific results from the Blue Gene project
The Blue Gene project involves the development of a highly parallel supercomputer, the coding of scalable applications to run on it, and the design of protein simulations that take advantage of the power provided by the new machine. This paper provides an overview of analysis techniques applied to scientific results obtained with Blue Matter, the software framework for performing molecular dyn...
MPI on BlueGene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System
The Blue Gene/L supercomputer uses system-on-a-chip integration and a highly scalable 65,536-node cellular architecture to deliver 360 Teraflops of peak computing power. Efficient operation of the machine requires a fast, scalable, and standards-compliant MPI library. Researchers at IBM and Argonne National Labs are porting the MPICH2 library to Blue Gene/L. We present the current state of the ...
Parallel Matrix Multiplication: A Systematic Journey
We expose a systematic approach for developing distributed memory parallel matrix-matrix multiplication algorithms. The journey starts with a description of how matrices are distributed to meshes of nodes (e.g., MPI processes), relates these distributions to scalable parallel implementation of matrix-vector multiplication and rank-1 update, continues on to reveal a family of matrix-matrix multi...
Journal title:
- IBM Journal of Research and Development
Volume 49, Issue
Pages -
Publication date: 2005